The ICSI Meeting Recorder Dialog Act (MRDA) Corpus
Authors
Abstract
We describe a new corpus of over 180,000 hand-annotated dialog act tags and accompanying adjacency pair annotations for roughly 72 hours of speech from 75 naturally occurring meetings. We provide a brief summary of the annotation system and labeling procedure, inter-annotator reliability statistics, overall distributional statistics, a description of auxiliary files distributed with the corpus, and information on how to obtain the data.
Similar resources
Joint segmentation and classification of dialog acts using conditional random fields
This paper investigates the use of conditional random fields for joint segmentation and classification of dialog acts, exploiting both word and prosodic features that are directly available from a speech recognizer. To validate the approach, experiments are conducted with two different sets of dialog act types under both reference and speech-to-text conditions. Although the proposed framework is ...
Meeting acts: a labeling system for group interaction in meetings
We describe a new system for labeling speech corpora with high-level group interaction tags, called “meeting acts.” The system was motivated by a need to assess work seeking to automatically detect meeting style using dialog act information. We present information about the relationships seen between dialog act sequences and meeting style to motivate the labeling process. We provide a summary o...
Towards a Decent Recognition Rate for the Automatic Classification of a Multidimensional Dialogue Act Tagset
In this paper, we present some thoughts and examinations on statistical dialogue act annotation using multidimensional dialogue act labels, based on the ICSI meeting corpus and the associated MRDA tag set. We show some statistics of this corpus, and preliminary results of a statistical tagger for the dialogue act labels, together with a proposal for a more realistic interpretation of these resu...
A new Metric for the Evaluation of Dialog Act Classification
The standard evaluation metrics for dialog act classifiers are based on the boolean outcome of exact classification. For multidimensional tag sets, such as the ICSI-MRDA tag set, this is stricter than necessary, since a misclassification might be only partial, and this can be good enough for the application in which the classifier is embedded. We propose a new forgiving metric and show some pr...
Backoff Model Training using Partially Observed Data: Application to Dialog Act Tagging
Dialog act (DA) tags are useful for many applications in natural language processing and automatic speech recognition. In this work, we introduce hidden backoff models (HBMs), where a large generalized backoff model is trained on partially observed data using an embedded expectation-maximization (EM) procedure. We use HBMs as word models conditioned on both DAs and (hidden) DA segments....